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Abstract 

Elfving's Theorem is a major result in the theory of optimal experimental design, which gives a 
geometrical characterization of c— optimality. In this paper, we extend this theorem to the case of mul- 
tiresponse experiments, and we show that when the number of experiments is finite, the c—,A—,T— 
and D— optimal design of multiresponse experiments can be computed by Second-Order Cone Pro- 
gramming (SOCP). Moreover, the present SOCP approach can deal with design problems in which 
the variable is subject to several linear constraints. 

We give two proofs of this generalization of Elfving's theorem. One is based on Lagrangian 
dualization techniques and relies on the fact that the semidefinite programming (SDP) formulation 
of the multiresponse c— optimal design always has a solution which is a matrix of rank 1. Therefore, 
the complexity of this problem fades. 

We also investigate a model robust generalization of c— optimality, for which an Elfving-type theo- 
rem was established by Dette (1993). We show with the same Lagrangian approach that these model 
robust designs can be computed efficiently by minimizing a geometric mean under some norm con- 
straints. Moreover, we show that the optimality conditions of this geometric programming problem 
yield an extension of Dette's theorem to the case of multiresponse experiments. 

When the goal is to identify a small number of linear functions of the unknown parameter (typically 
for c— optimality), we show by numerical examples that the present approach can be between 10 and 
1000 times faster than the classic, state-of-the-art algorithms. 

Keywords: Optimal design of experiments, Multiresponse experiments, c— optimality, 
A— optimality, SDP, SOCP, Convex optimization. 



1. Introduction 

An important branch of statistics is the theory of optimal experimental designs, which explains 
how to best select experiments in order to estimate a vector of parameters 0. We refer the reader 
to the monographs of Fedorov |Fed72] and Pukelsheim |Puk93] for a comprehensive review on the 
subject, and to an article of Atkinson and Bailey |AB01j for more details on the early development 
of this theory. 
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The importance of Elfving's Theorem (1952) |Elf52j . which was one of the first major improvements 
in the field of optimal design of experiments, was illustrated in many works |Che99l IDet93t IDS93t 
IHHS95t IStuTlt IStuOSj . This result gives a geometrical characterization of the experimental design 
which minimizes the variance of the best estimator for the linear combination of the parameters 
c^O. Namely, the optimal design can be found at the intersection of a vectorial straight-line and 
the boundary of a convex set referred as Elfving Set. When the number of available experiments is 
finite, Elfving set becomes a polyhedron, and so Elfving's geometrical characterization allows one to 
compute c— optimal designs by linear programming methods, see Harman and Jurik |HJ08j . 

An interesting case appears in the study of the design of experiments, when a single experiment is 
allowed to give simultaneously several observations of the parameters. This setting is referred in the 
literature as multiresponse experiments, and occurs in many practical situations. For more details on 
the optimal design of multiresponse experiments, the reader is referred to the book of Fedorov |Fed72j . 



In this paper, we extend Elfving's classic result (Theorem 2.1) to the optimal design of multire- 



sponse experiments, see Theorem |3.1[ and we show that the latter problem reduces to a Second-Order 



Cone Programming problem (SOCP), see Theorem 3.3 



Moreover, we show in Section 4.1 that this result generalizes the Elfving- type result of Stud- 
den |Stu71j . which characterizes geometrically the A— optimal design for the estimation of a multi- 
dimensional linear combination of the parameters. As a consequence, one may cast the problem of 
finding an A— optimal design on a finite regression range (for single- or multi- response experiments) 
as an SOCP. 

In contrast to the classic algorithms for the computation of optimal designs, the flexibility of 
mathematical programming approaches makes it possible to add constraints in the problem without 
additional effort. We will see that the present SOCP approach can indeed handle optimal design 



problems subject to multiple resource constraints, see Section [42 

The fact that the c— optimal design problem (on a finite regression space) has a semidefinite 
programming (SDP) formulation can be traced back to 1980, as a particular case of Theorem 4 
in Pukelsheim |Puk80] . Similarly, the classic A- and D- optimal design problems for multiresponse 
experiments can be formulated as SDP |VBW98] . Second Order Cone Programming (SOCP) is a class 
of convex programs which is somehow harder than linear programming (LP), but which can be solved 
by interior point codes like SeDuMi |Stu99] in a much shorter time that semidefinite programs (SDP) 
of the same size. Moreover the SOCP method takes advantage of the sparsity of the matrices arising in 
the problem formulation. Hence, this work shows that computing A- and c— optimal designs actually 
belongs to an easier class of problems, and makes it possible to solve instances that were previously 
intractable. 



While our proof of Theorem 34 is an extension of Elfving's original proof, it leaves unexplained why 
the SDP formulation actually reduces to an SOCP. We provide in Section |5] another proof of the present 
Elfving-type result based on Lagrangian relaxation, which explains why the complexity of this problem 



fades. The proof relies on Theorem 5.2 which shows that a certain class of semidefinite programs 
(packing programs with a rank-one objective function) have a rank-one solution. Theorem 5.2 appears 
to be a result of an independent interest, in relation with the study of semidefinite relaxations of 
combinatorial optimization problems, and is therefore the subject of the companion paper |Sag09 . 



We next show (Theorem 6.1) that T— optimality also admits a second order cone programming 
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representation, which gives another argument for saying that second order cone programming is a 
natural tool for handling experimental optimal design problems. If the experimenter wishes to estimate 
the full vector of parameters 0, the T— optimal design problem is trivial. However, if he is interested 
in a linear subsystem K^d, the T— optimal design problem is complicated and can be handled by 
SOCP. 

In Section [7| we consider a generalization of c— optimality which was originally proposed by 
Lauter |Lau74j in order to deal with the uncertainty on the model. In this approach (called 
5— optimality by Lauter), the objective criterion takes into account the variance of several estima- 
tors, balanced in a log term with coefficients which indicate the belief of the experimenter in each 
model. Dette |Det93j characterized by an Elfving-type result the 5— optimal designs. We show in 



Theorem 7A_, which is the main result of this section, that when the number of available experiments 
is finite, S*— optimal designs of multiresponse experiments can be computed efficiently by minimizing 
a geometric mean under some norm constraints. Moreover, we show in Theorem 7.3| that the op- 



timality conditions of this geometric program yield an extension of Dette's theorem to the case of 



multiresponse experiments. As a consequence of Theorem 7.1, we obtain a SOCP for D— optimality 



(cf. Remark 7.5). The results of this section are proved in appendix. 



This work grew out from an application to networks |BGS08j . in which the traffic between any 
two pairs of nodes must be inferred from a set of measurements. This leads to a large scale optimal 
experimental design problem which can not be handled by standard SDP solvers. In a companion 
work relying on the present reduction to an SOCP |SGB10j . we illustrate our method by solving within 
seconds optimal design problems that could not be handled by SDP. In addition, the constraints of the 
SOCP formulation involves the observation matrices Ai of the experiments, which happen to be very 
sparse in practice. This is in contrast with the information matrices involved in the SDP formulation 
(Mi = Ai), which are not sparse in general. The present SOCP formulation takes advantage of the 
intrinsic sparsity of the data, and the instances can be solved very efficiently. 

We present in Section [8] some numerical experiments for problems of the kind that arise in network 
monitoring, as well as classic polynomial regression problems. Our experiments show that the SOCP 
approach is well suited when the number of linear functions to estimate is small. For the case of 
c— optimality (only one linear function to estimate), we will see that solving the second order cone 
program is usually 10 times faster than the classic exchange or multiplicative algorithms, and up to 
2000 times faster than the SDP approach for problems with multiple constraints. 



Some results of this paper, including Theorem 3.1, were presented at the conference |SBG09j 



and the technical result justifying the reduction to a SOCP was posted on arXiv Sag09] . Shortly 



before the time of submission, Dette and Holland-Letz published an article in Annals of Statistics., 



in which Theorem 3.1 was established independently (Theorem 3.3 in |DHL09] ). Dette and Holland- 
Letz considered a heteroscedastic model (i.e. an experimental model where both the mean and the 
variance of the observations depend on the parameter of interest), which led them to study the case in 
which the observation matrices are of rank k > 2, just as in the model of multiresponse experiments. 
They used their geometrical characterization of the c— optimal design for heteroscedastic models in 
an application to toxicokinetics and pharmacokinetics. It should also be mentioned that the proof 
of Dette and Holland-Letz relies on an equivalence theorem (Theorem 3.1 in |DHL09] ). while ours 
is closer to Elfving's original approach, as done previously by Studden [Stu05] for other results in 



optimal design of experiments. The main result of this article (reduction to a SOCP, Theorem 3.3), 
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provides a new insight on the relations between these two approaches : they are actually dual from 
each other (in the Lagrangian sense). Indeed, the approach of Dette and HoUand-Letz corresponds to 
the primal SOCP ([s]), while our geometrical characterization corresponds to the dual SOCP (|9]), and 
strong duality holds between these two optimization problems. 

2. Preliminary results 

Before stating the main results of this paper, we introduce the necessary background and recall 
Elfving's classic result. Throughout this paper, we denote vectors by boldface letters and matrices by 
capital letters. We make use of the standard notation [n] = {1, . . . ,n}. The components of a vector 
V G are denoted by fi, . . . ,Vn- The L^— norm of v is denoted by and the Frobenius norm 
of a matrix A by \\A\\p = ^/trace (AA^). The notation v > means that every component of v is 



The most common model in optimal experimental design assumes that each experiment provides a 
measurement which is a linear combination of the parameters up to the accuracy of the measurement. 
Let X denote the set of available experiments. Every experiment x E X provides a measurement 



where is the m— dimensional vector of unknown parameters, and is the (m x l)-regression 
vector. Uncorrelated experiments are performed at different locations Xi, ...,Xs from the compact set 
X (possibly infinite), and the objective is to determine both the optimal choice of the Xi, and the 
number of experiments rii to be conducted at Xi ; we call such a subset of experiments a design. 

Rather than deciding the exact number of times that each experiment should be conducted, it 
has been proposed to work with approximate designs instead, which is simply done by releasing the 
integrity constraints on the Ui. In this setting, a mass indicates the proportion from the total number 
of experiments to be conducted for each available experiment. For example, if the weight for the i^^ 
experiment is Wi, and that N experiments are allowed, Nwi are chosen at Xi. This definition suggests 
that the vector of weights w is such that each quantity Nwi is integer. However, the continuous 
relaxation where every vector w summing to 1 is allowed is of theoretical interest. The approximate 
design where the percentage of experimental effort at xi is Wi is written as 



or ,^ = {xi.,Wi} for short. In this paper, we consider only approximate designs. 

The celebrated result of Elfving |Elf52j gives a geometrical characterization of the design (known 
as c— optimal) which minimizes the variance of an unbiased estimator for a single linear combination 
cFd of the parameters. In this paper, we are going to extend Elfving's Theorem to the multiresponse 
case. To this end, we consider the linear regression model 



where is an unknown m-dimensional parameter, y{x) is the /-dimensional measurement vector for 
the experiment at a;, A{x) is a / x m observation matrix, and e{x) is the error on the measurement. 
The matrix A{x) may eventually become rank deficient for certain values of x. This allows one to 



nonnegative. 






y{x) = A{x)6 + e{x), xeX, 
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handle the case in which the experiment at x gives only Ix < I measurements. In this case, we set 
I — Ix rows of the matrix A{x) to zero. 

The observations are uncorrelated and have a constant variance, which will be assumed to be one 
for the case of the presentation, i.e. 

yx^,X2 eX, x^^ X2, E(e(a;i)) = 0, E(e(a;i)e(a;i)^) = /, E(e(a;i)e(a32)^) = 0. 

In fact, one can always reduce to this case when the variance of the experiments is known, after a left 
scaling of the observation equations. 

When Ui — Nwi experiments are conducted at Xi, we denote by y{xi) the average of these 
observations: we have 'E{y{xi)) — A{xi)d, and Var(y(a;i)) = ^J. For the design ^ = {xi,Wi}, we 
denote by y{C) the aggregate vector of observations [^(aji)'^, ...^y{xsY'Y ■• ^"^^ by A{^) the aggregate 
observation matrix [A{x-if , ...,A{XsYY ^^at E(y(^)) — A{!^)d, and Var(y(^)) = -^A(iy), where 



A(w;) 



(2) 



with {I X Z)— identity blocks on the diagonal. If Wj ~ for some i G [s], we simply remove the 
measurement point Xi from ^. For ease of presentation, we get rid of the multiplication factor 1/N , 
since it does not affect the results on optimal designs. 

Assume now that an experimenter wishes to estimate the scalar quantity C = cp-O, that is to say 
that he wants to estimate a linear combination of the parameters. It can easily be seen that a linear 
estimator ^ = h^y{C) is unbiased if and only if A{^)^h = c. Thus, linear unbiased estimators for c^6 
exist as long as c G Range(A({)-^). We will say that the quantity ( = c^6 is estimable when there 
is a design ^ such that c G Range(A({)^). Notice that a sufficient condition which ensures that c^O 
is estimable for all c G is that the matrices {^A{x))^^^ contain m linearly independent vectors 
among their rows. For an estimable quantity c^6, we define the feasibility cone S(c) as the set of 
designs ^ such that span the vector c, and a design ^ will be said feasible if it lies in the 

feasibility cone. 

Now, let us assume that the quantity c^d is estimable. Our protagonist certainly wants to choose 
an unbiased estimator of C with minimal variance; the variance of ^ is equal to h."^ Var(y(^))/i = 

A(w)h, and we know from Gauss-Markov theorem that it is minimized under the unbiasedness 
constraint A{(fh = c for h = A(it»)"M(^)(A(^)^A(it»)"M(^))tc, where Aft denotes the Moore- 
Penrose pseudo inverse of M, and the variance of this best linear unbiased estimator is : 

Var(C) = c'iAiO^Aiwr'AiOyc = c^MiO'c, 

where M(^)~ is a generalized inverse of Af(^) = A^^y /\{w)^^ A{^) ^ i.e. any matrix G verifying 
M{^)GM{^) = M(^). Notice that this expression does not depend on the choice of the generalized 
inverse G, since there exists a vector u such that M{^)u = c: 

VG G M{^)-, c^Gc = u^M{i)GM{i)u = u^M{^)u. 

The (m x m)— matrix M(^) is traditionally called the information matrix of the design, and it can be 
written as a weighted sum of the observation blocks : 

s 

MiO-J^'^iAixiYAixi). (3) 

i=l 
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In a more general setting, w is replaced by a probability measure /i on A' 



M(0 = / A{xfA{x)dfx{x). 
Jx 

However, this continuous form of the information matrix is still a symmetric matrix from the closed 
convex hull of {A{xY^ A[x), x G A"}. When X is compact, and x ^ A{x) is continuous, the set of all 
information matrices {Aix)"^ A{x), x G X} is closed, and we know from Caratheodory's theorem that 
M{^) can be written as barycenter of m(m + l)/2 + 1 information matrices (see Fedorov |Fed72j ) . 
Therefore, the optimal design can always be expressed with a discrete measure n = Wi6{x — Xi) + 
... + WsS{x — Xs), where S is the Dirac measure and s < m{m + l)/2 + 1. We will consider only such 
discrete designs in this work. 

The c— optimal problem is to find the feasible design ^ = {xi^wi] minimizing the variance of the 
aforementioned estimator: 

min c^M{C)~c (4) 
CeH{c) 

s 

s.t. M(0 = ^w,A{xif A{xi) 
1=1 

s 

^^Wj = 1; V z G [s], > 0, a^i G Af. 

i=l 

We point out that Problem (|4]) is not limited to the case of multiresponse experiments; there are 
other situations where the information matrix takes the form of Equation ([s]) (with Z > 1), for 
example in the case of models with parametrized variance or in models with correlated observations 
(see e.g. |MP95l[PM] ). 

In the classic model (single response experiments), y{x) is a scalar observation, which means 
that / = 1 and A{x) = a^^ is a row vector. In this case, Elfving's Theorem gives a geometrical 
characterization of the c— optimal design. We first define the Elfving set as the convex hull of the 
vectors ±aa,: 

£ = convex-hull | ± aa, , x E X^, 
and we denote its boundary by dS. 

Theorem 2.1 (Elfving |Elf52j ). A design ^ = {xi,Wi} is c— optimal if and only if there exists scalars 
ei = ±1 and a positive real t such that 

s 

tc = ''^^Wieittx^ G d£. 

i=l 

Moreover, = C"^Af(^)~c is the minimal variance. 

Elfving's theorem shows that the c— optimal design is characterized by the intersection between 
the vectorial straight line directed by c and the boundary of the Elfving set S. We also point out that 
when the vector c is not spanned by the regression vectors {ax)xex, in other words when c^6 is not 
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Figure 1: Geometrical representation of Elfving's theorem in dimension two. The gray area represents the Elfving set, 
which is a polyhedron because X is finite (here, X = {f,2, 3,4}). The intersection t*c determines the weights of the 
c— optimal design: w* = [0,0, |, |]^. 

estimable (i.e. S(c) = 0), then the only scalar t such that tc lies in £ is 0, and so a c— optimal design 
does not exist. 

We show on Figure [l] a representation of Elfving's theorem in dimension 2. Here, X = {1,2,3,4} 
is finite, such that the Elfving set is a polyhedron. The vector c is along the ^i— axis, which means 
that the experimenter wants to estimate ( = 9i. The intersection between this axis and the Elfving 
set indicates the optimal weights of the c— optimal design: 1^3 = | and 1^4 = |. Note that since a2 is 
in the interior of the Elfving set, the experiment 2 is never selected, whatever is the vector c. This 
example also shows that the optimal design w* can be computed by linear programming (LP) when 
X is finite (intersection of a straight line and a polyhedron). This feature was noticed by Harman 
and Jun'k |HJ08j . who formulated the c— optimality LP: 

max t (5) 

t,w 

s.t. tc = eitti 

i 

-Wi<ei< Wi, \fi G [s] 
Wi = l,w > 0. 
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3. Extension to the case of multiresponse experiments 

In this paper, we extend Ehving's resuh to the case of muhidimensional observations, by defining 
an analog of the Elfving set for the multiresponse case: 

S = convex- huU [A{xfe, e G M', ||e|| < 1, xeX]. 



Following our proof, we further show in Theorem 3.3 that the c— optimal design of multiresponse 



experiments can be formulated as a Second Order Cone Program. 

Theorem 3.1 (Extension of Elfving's theorem for multiresponse experiments). A design = {xi, Wi} 
is c— optimal if and only if there exists a positive scalar t and vectors in the unit hall of W (i.e. 
^ such that 

tc = '^^WiA(xi)'^ei E d£. 

i 

Moreover, t^"^ = c^M{^)^c is the minimal variance. 

Proof. We consider an unbiased linear estimator for ( = c^6 : 

( = h^y{^), with h = [hi^, hs'^f e hi e M.K 

The unbiasedness property forces the following equality to hold : 

s 

i=l 

Now, the Cauchy-Schwarz inequality gives the following lower bound for the variance of ( : 

Var(C) = h^AMh = ^^> (X^ll^fcll)', (6) 

where and A(w) was defined in Equation ([2]). We recall that we assume w > without loss of 
generality, since an experiment with a zero weight can be removed from the design ^. 
We show that ^ G by writing: 

A^kW tell A^kW fell i A^kW fell {i:||hi||>0} 

where /ij = jjl^^y and ej = so that ||ei|| = 1, /i^ > and — ^■ 

Let t be a positive scalar such that tc E dS. The fact that ^ ^ |^^^|| G S implies 

l^kW fell 

Combining ^ and ([T]) leads to the lower bound for the variance of any linear unbiased estimator 
ofC 
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We will show that this lower bound is attained if and only if the design ^ satisfies the condition 
of the theorem. To do this, notice that for a design ^ and an estimator h^y{^) to be optimal, 
it is necessary and sufficient that the inequalities ^ and ([T]) are equalities. The Cauchy-Schwarz 
inequality ^ is an equality if and only if w is proportional to the vector ||/is||]^, i.e. 



\\hi 

Wi 



The second inequality ([T]) is an equality whenever ^ ^^^^^^ E d£, i.e. ^ ||^^|| = t, where t is the largest 
real such that tc E S. We can write 

d£ 3 tc = A{xifhi = ^ HiA{xi)^ei, 

i {i:\\hi\\>0} 

with /ij = and = jj^- We have ||e^|| = 1, and the equality conditions are satisfied if and 

only ii fii = Wi. □ 

Remark 3.2. The latter theorem has a simple geometric interpretation. In the scalar case, we have 
seen that the c— optimal design could be find at the intersection of a polyhedron and a straight line 
directed by c (see Figure [T]). In the multiresponse case, the generalized Elfving set is no longer a 
polyhedron: instead, we compute the intersection between the straight line directed by c and the set 

£ = convex-hull jAfej, i G [s], G M', ||ei|| < l}, 
= convex-hull [Si, i E [s]}, 

where £i is the ellipsoid with semi-axis \J A^*^w^' ''^ {k E H), where {aP,...,A^^} are the eigenvalues 

of AjAi and {^i^^ . . . , are the corresponding eigenvectors. In the common case, we have / < m, 
such that some eigenvalues of AjAi vanish and the ellipsoid Si is not full-dimensional (i.e. its volume 
is zero). We illustrate this geometric interpretation in Figure |2} Moreover, we see in next theorem 
that the intersection which characterizes the c— optimality can be computed by a Second order cone 
program. 

Theorem 3.3 (Computation of the c— optimal design by SOCP). Assume that the number of available 
experiments is s, so that X can be identified with [s]. Let u*, {iJ,*,h*) be a pair of primal and dual 
solutions of the second order cone programs: 

(P-SOCP): max c^u (8) 

We[s], \\AM\<i 

(D-SOCP): min J^;., (9) 

s 

i=l 

V^e[s], \\hi\\<fii. 
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Figure 2: In the multiresponse case, the generahzed Elfving set E is the convex huU of the ehipsoids Ei. On this picture, 
we have plotted the rows of the observation matrices: af^ is the j"^ row of Ai. In the (common) case where / < to, the 
vectors (ctij)jG[/i] ^'^^ boundary of the eUipsoid Ei (here, this is the case for Ei,E2, and £4, but not for £3 since 

= 3 > 2). Also note that when I < m, the ellipsoid Ei is not full dimensional (on the picture, we have I4 — 1 < 2, such 
that E4 is a segment). The intersection of the line directed by c and the generalized Elfving set (denoted by a brown 
circle on the figure) indicates the weights of the c— optimal design. Here, t*c is at equal distance of the two extremal 
points Xi e El and X3 e £3, so that the c— optimal design is ii; = [0.5, 0, 0.5, 0]"'". 



We define 

s 

w:=ti-i*, where t=C^^fi*)~^. 



i=l 

Then w is a c— optimal design. Moreover, ( = ^h*^yi is the best linear estimator of c^O, and the 
optimal variance is var((^) = = (Xlj/^i)^ = (c"^w*)^. 



Proof. This result is actually a corollary of Theorem As in the proof of the latter theorem, define 
t as the largest scalar such that tc G i.e. such that there exists Wi summing to 1 and vectors in 
the unit ball of M' satisfying 

s 

tc = y^^WjAjej. 

i=l 

This decomposition gives the optimal weights Wi and the best estimator of (: 

s 

C = Y,hi^yi^ (10) 



1=1 
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where hi = ^ei. According to the proof of Theorem 3.1 indeed, an unbiased estimator of the form ( 10 ) 
is optimal if and only if every hi is proportional to and has norm y-. Setting Zi = WiSi, one obtains 
t as the value of the following SO CP: 

max t (11) 

t,Zi,W 



i=l 



yi G [s], ||zi|| < Wi, 

y^^wj = 1, w >o. 

i 

In order to get an SOCP in the standard form, we write Wi = tfii, where t = ^ — is an arbitrary 
nonnegative scalar. Then, we set hi = t~^Zi, and we obtain a problem in the form of Finally, 
the value of (P — SOCP) and {D — SOCP) are equal, since the Slater condition holds for this pair of 
programs (the dual {D — SOCP) is strictly feasible and the primal (P — SOCP) is feasible). A proof 
of the strong duality theorem for SOCP can be found e.g. in jNN94j . Section 4.2. See jLVBL98] for 
more background on SOCP duality theory. □ 

The previous theorem shows how one can compute the c— optimal design on a finite regression 
region by solving an SOCP. This can be done with the help of interior points codes such as Se- 
DuMi |Stu99j . Solving the latter SOCP (|8]) is a much easier task than solving the former state- 
of-the-art SDP for c— optimality, because the number of variables is in the order of m (instead of 
m^); because we have get rid off the positive semidefiniteness constraint of the SDP; and because the 
SOCP solver is able to exploit the sparse structure of the observation matrices Ai (while the partial 
information matrices Mj = AfAi involved in the SDP are not very sparse in general. As stated in 
the introduction, this approach has been used for the computation of c— optimal designs for a net- 
work application |SGB10j . Very large instances of experimental design problems arise for the optimal 



monitoring of IP networks indeed, and we will see in Section [873] that the present SOCP can handle 
them. 



4. Two extensions of the main result 

In this section, we show that optimal designs can be computed by SOCP in two particular situations 
which generalize c— optimality, namely when the experimenter wants to estimate several quantities 
Ci^6, . . . , Cr^6 and chooses the A— optimality criterion, and when the design is subject to multiple 
resource constraints. 

4.I. The case of A— optimality 

When there are several quantities of interest, i.e. when <^ = {ci^d, . . . , Cr^O)'^ consists in a collec- 
tion of r linear combinations of the parameters (<^ = K^6, where K is the m x r matrix formed by 
the columns Ci), it is known that the best linear unbiased estimator is 

C = ir^(M(0)U(0^AM"'2/(0, 
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and its covariance matrix is 



Var(C) = {MiOrK. 



Notice that an interesting case occurs when K = I, i.e. when the experimenter wants to estimate the 
whole vector of parameters. As in the case of c— optimahty, we say that <^ = K^0 is estimable if there 
is a design ^ such that Ranged C Range 74(^), and we denote by '^{K) the feasibility cone (i.e. the 
set of designs ^ satisfying the latter range inclusion). 

The objective is now to minimize, in a certain sense, the variance of this best estimator. A 
widely used criterion is to minimize the trace of this matrix : such a minimizing design is known as 
A— optimal. 



min tTcice(K'^M(0~K) 



(12) 



s. t. M(0 = WiM^ifM^i) 

i=l 

s 

y^^Wj = 1; V i G [s], Wi > 0, Xi e X. 



i=l 



We show in this section that computing the A— optimal design for the parameter of interest K^0 
can be written as a c— optimal design problem with multidimensional observations. The objective 
function of (12) can indeed be written as 

r 

trace {K^M{0-K) = ^Cfc^M(e)-Cfc. 

fc=i 

We define the vector c as the vertical concatenation of the columns Ci, i.e. c = [ci^^, ...,Cr'^]'^. Now , 
we have: tmce{K'^ M{^)- K) = c^M{C,)~c, where: 



/ M(0 



M(0 



M{0 J 



E 



2=1 



Wi 



A{xifA{xi) J 



E 

2=1 



Wi 



I A{xi) 



\ 



\ 



A{xi 



A{xi) ) 



\ 



A{xi) I 





1 



= Y^w,A{xiYA{xi). 

i=l 

In the latter equation, A{xi) contains r blocks and is of dimension rl x rm. We can now rewrite 
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Problem ( 12 ) in the following form: 



mm trace (c^M(^)^c) 



s. t. M(0 = Yl WiA{xifA{xi), 

i=l 

s 

Wi = 1; W i E [s], Wi > 0, Xi E X . 



i=l 



We have thus shown that the problem of finding the A— optimal design is nothing but a c— optimal 
design problem, with augmented observation matrices A{xi). As a consequence, the results of Section|3] 
on c— optimality also apply for the more general A— optimal design problem for a subsystem K^O of 
the parameters. In particular, the A— optimal design problem reduces to an SOCP: 

Theorem 4.1 (Computation of the A— optimal design by SOCP). Let (f/*, (/^*, {H*)i(z[s])) be a pair 
of primal and dual solutions of the second order cone programs: 



max trace K^U (13) 



mm 



Wi e [s], \\AiU\\F < 1 

i 

k = J2aJh^ 

i 

\/i G [s], \\Hi\\F < lii- 

s 

w=ti-i*, where t=(Y^fi*)~^. 

i=l 

Then, w is A— optimal for K^O. Moreover, C, = Ylii'^i)'^yi ^■^ ^^^^ linear unbiased estimator of 
K^O, and the optimal A— criterion is 



We define 



7 

Yci^M{w*)-Ci = t-^ = {J2f^*f = (trace K^U*'^ 

i=l 



We further show that the geometrical characterization of c— optimality for multiresponse experi- 
ments generalizes the result of Studden |Stu71] , who established an Elfving-type result for A— optimal 
designs of single-response experiments (^ = 1 and A{x) = is a row vector). This characterization 
is based on the following extension of the Elfving set when the matrix K is m x r: 

£s = convex-hull{a^e^|3; eX, ee W\ \\e\\ < 1} C M™^'' 



13 



Theorem 4.2 (Studden,1971). A design ^ = {xi,Wi} is A— optimal for K^O if and only if there 
exists a scalar t > and vectors ei in the unit ball of W such that 



tK = ^Wia^.eJ E dSs- 

i 

Moreover, t~^ = tTace{K^M{^)^ K) is the optimal value of the A— criterion. 



One can easily verify that this theorem is a particular case of Theorem |3.1[ Using the previously 
introduced notation indeed, Theorem 3.1 says that ^ = {xi,Wi} is A— optimal for K^6 if and only if 



there exists a scalar t > and vectors in the unit ball of W'- such that 

tC = ^ 9S, 

i 

and we notice that c is the vectorized version of K, and when / = 1, £^ is the vectorized version of £s 
and A{xi)^ei = [e-nax^'^, ■ ■ ■ , ^isO-xJ^Y is the vectorized version of ax^^i^. 

4-2. Optimal designs subject to multiple resource constraints 

The great advantage of the mathematical programming formulations (LP,SOCP,SDP,...) resides 
mostly in their flexibility, and the possibility to add "without effort" new constraints in the problem. 
Elfving studied the case in which the available experiments have different costs |Elf52j . If the cost of 
the i^^ experiment is Pi, and the experimenter disposes of a budget b, the constraint becomes: 



WiPi < b. 

Now, Wi can not be interpreted as the percentage of experimental effort to spend on the i^^ experiment 
anymore. Instead, the quantity should be seen as the percentage of budget to allocate to the 
experiment i. Elfving noticed that the change of variable w[ = brings the problem back to the 
standard situation, and is equivalent to a scaling of the observation equations ([T]). 

Consider now the more general case in which it> is a control variable for the experiments, such that 
the information matrix takes the standard form M{w) = X]i=i WiAfAi for some observation matrices 
Ai. We assume that w is constrained by several linear inequalities 

Rw < b, (15) 

where b G M", -R is a n x s matrix and the inequality is elementwise. Contrarily to the previous situation 
with a single budget constraint, there is no simple change of variable which brings the problem back to 



the standard case Wi = 1), because we do not know which inequalities will be saturated in (15) at 
optimality. This constrained problem has been studied by Cook and Fedorov |CF95] . who proposed 
an extension of the Fedorov exchange algorithm. However, the latter exhibits a slow convergence in 
practice. 

This constrained framework arises in the problem of optimally setting the sampling rates of a 
measuring device on a network. In this case, w is the vector of the sampling rates of the monitoring 
tool at different locations of the network, and the constraint Rw < b reflects the fact that only a 
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certain number of packets should be sampled at each router |SGB10] . Another example of appli- 
cation of optimal design problems with multiple constraints was given by Vandenberghe, Boyd and 



Wu |VB W98] : they have shown that constraints of the form (15) can be used to avoid concentrated 
designs. For example, we can impose that no more than a given fraction of the experimental effort, 
say 90%, is concentrated on a small number of experiments, say 10% of the possible observations. 

Adding those multiple resource constraints in the SDP formulation of the c— optimal design prob- 
lem is straightforward. Doing the same thing for the SOCP formulation is a little more tricky, since 
some hyperbolic constraints need be reformulated as conic constraints. We do this in the next theo- 
rem. A related result was obtained by Ben-Tal and Nemirovskii |BTN92j . for an application to truss 
topology design (see also |NN94l ILVBL98] ). Our statement of the theorem shows that the optimal 
variables of the SOCP moreover give the best estimator of c^6, and that the result holds even if the 
optimal information matrix is singular (but contains c in its range). 

Theorem 4.3. The following SOCP is feasible if and only if c^6 is estimable for a feasible design 
(i.e. 3w > : Rw < b and c G Range M(iu) ). 



mm 

w>0, /x> 



in y^/Ltj 



I i=l 



(16) 



i=l 

Rw <b, w > 



2hi 

Wi - fli 



< Wi + fii, {i = l,...,s). 



then w* is c~ optimal (in the sense 



If the latter SOCP moreover admits a solution {w*, fi* 
of the general problem with constraints Rw < b), the best unbiased linear estimator of ( = c^6 is 
( = h*^yi, and the optimal variance is var((^) = c^M{w*)~c = X]i=o/^i ■ 

Proof. Let w he a feasible design {Rw < b, c & Range M(it>)). The Gauss Markov Theorem allows 
us to rewrite the objective criterion of the c— optimal design problem as: 



c M{w) c = min h A{w)h 



(17) 



s.t. [A^,...,A^]h = c, 



where A(w) is defined in Equation (|2]), and the optimal vector h in this problem defines the best 



linear unbiased estimator ( = h^y{^) of ( 
expression h^A(w)h can be rewritten as 



c 6. Decomposing h as [hi'' 



h, 



TiT 



hi e 



the 



E 



\hi 



(18) 



Recall that when an experiment is unobserved [wi = 0), it could simply be removed from the support 
of the experimental design. In other words, the sum (18) is taken on the indices such that Wi > 
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only. We can now rewrite the c— optimal design problem with constraints Rw < b in a form that 
involves the vector of coefficients h of the estimator (: 

\h,P 



mm 

iv, (h,iGMO,e[, 



s.t. 



Wi 



Clearly, this is equivalent to: 



mm 



S.t. 



E 

{i:w^>0} 
s 

i=l 

Rw <b, w> 0. 



s 
i=l 

±Ajh 

1=1 

Rw < b, 
\\hA\' 



(19) 



(20) 



since we can assume without loss of generality that Wi = ^ = fXi = 0. Finally, the SO CP (16) 
is obtained by reformulating the hyperbolic constraints < a(3 as 



2z 

a — 13 



< a + 



We now show that Problem (20) is feasible if and only if the quantity c^6 is estimable for a feasible 
design. We first assume the contrary, i.e. Rw < b, w > ^ c ^ Range M{w). Let w satisfies the 
inequality constraints; we denote by X = {ii, . . . ,ip} the subset of indices such that Wi > 0, so that 
we have 

Range M{w) = Range [A^, . . . , AjJ. 

Now, if Yli=i ^I^^i — ^ f*^^ some vectors hi, it implies that hi ^ for an index i such that Wi = 0, 
and the hyperbolic constraint H^iP < fJ^iWi is violated. Conversely, let w he a. feasible design, and 
X = {ii, . . . ,ip} defined as above. Since c G Range M{w), we can write J2k=i ^Ik^ik — ^ some 



vectors h 



hi . Finally, we define /Xj 



for all i G X, and h^ 



0, 



indices j G [s] \ X: Problem (20) is feasible for the vectors w, fi and {h 



Hj = for all the other 
i)i(z^s\- However, we point out 
that Problem (20) (as well as the c— optimal design problem with multiple constraints) may fail to 
have a solution when the polyhedron {w > : Rw < b} is not bounded, because in some cases the 
minimum is not attained and the infimum can be approached by a sequence of designs (tt>t)teN such 
that \\wt\\ — > oo. □ 



t—>-+co 



5. Alternative approach via Lagrange duality 

The aim of this section is to prove directly the equivalence between Problem ^ and the Second 
Order Cone Programs ([8])-(|9]) by use of Lagrangian duality tools, when the number of available exper- 
iments is finite (or equivalently that the locations of the selected experiments are given in advance). 
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As stated in the introduction, the motivation for this work is to understand why the former SDP 
approach of this problem actually reduces to a much easier problem. Moreover, our proof handles the 
multiresponse case in the same manner as the single-response case, by simply substituting observation 
row vectors with observation matrices in the constraints of the optimization problem, and it can be 
applied in a more general framework (cf. Section [?]). 

We first recall that the c— optimal design problem Ml) is equivalent to a semidefinite program. 



The c— optimality SDP already appeared in Pukelsheim |Puk80] , hidden under a more general form. 
The proof given in this paper is more elementary, and it can be adapted to handle S*— optimality (cf. 



a 
I jP] 



Lemma A.l). In the sequel, ^ denotes the Lowner ordering of symmetric matrices, that is, B y C 



if and only if i? — C is positive semidefinite. 

Theorem 5.1. When the regression range is finite (X = [s]), the c— optimal design problem Q is 
equivalent to the following pair of primal and dual SDP. More precisely, let {X*,fj,*) be a pair of 
primal and dual solution of the SDPs: 

(P-SDP) max c^Xc (21) 

s.t. tT&ce{AiXAj) < 1, Wie [s] 

(D-SDP) min ^/i^ (22) 

~ i 

S.t. fXiAf Ai y cc^ , 

i 

Then, the vector w = ^, is solution of the c— optimal design problem and the following relation 
holds: 

c^M{w)-c = ^^* = c^X*c. 

Proof. Since the feasible designs w must be such that c G Range M(iu), we can write, using a 
generalized Schur complement: 



t > c^M{w)-c 
t > 



M{w) 


c 




t 



y 0, 



for t E M.. Note that if we exclude the trivial case c = 0, we can always assume that t > 0. So, the 
c— optimal design problem (|4]) can be rewritten as: 



min t 

teR,w>o 



s.t. 



(23) 



M{w) 


c 




t 



y 0. 



Wi 



Now, we re-express the positive-semidefiniteness of the matrix in (23) with another Schur comple- 
ment: 



M{w) 


c 




t 



y 



M{w) y 

M{w) > t-^cc^ 
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Setting the new variable fi = tw, so that t = Ylif^i^ obtain a semidefinite program in the form 
of {D — SDP). Strong duahty holds between (D — SDP) and {P — SDP), because the Slater condition 
for semidefinite programming is satisfied (the primal problem (P — SDP) is strictly feasible, and its 
dual is feasible). Finally, notice that the dual feasibility condition '^■fiiAjAi >z cc^ implies that 
c is included in Range /XjAfAj = Range M{w), so that we do not have to worry about the the 
constraint ,^ G S(c) in the initial problem Q. □ 



We notice that when the above SDP has a rank one solution, it can be reduced to the primal SOCP (jSj). 
The next theorem shows that such a rank-one solution always exists indeed, and its proof can be 



found in |Sag09| . Note that this theorem is of independent interest. The general result establishes 
the existence of rank-r solutions for the class of semidefinite packing problems in which the matrix 
defining the objective function has rank r: 

Theorem 5.2 (Low rank reduction theorem for semidefinite packing problems |Sag09| ). Denote by 
Sm the space of m x m symmetric matrices, equipped with the inner product {A,B) := trace (Ai?-^). 
We consider the following semidefinite packing program 

max {K^K,X) (24) 

s.t. {AjA,X) < hi, V2G[s], 
X ^ 0, 

This problem is feasible if and only if every hi is nonnegative, and is bounded if and only z/ Range C 



Range (^^'^ y4fy4j). Under these two conditions, the SDP (24) has a solution X which is of rank at 
most r := rank K^K. 

In particular, if K = c is a vector ( i. e. rank K^K = 1), then there is an optimal variable in the 
form X = xx^ for some vector x G M™, and x is the solution of the following Second-Order Cone 
Program: 

max cFx (25) 



s.t. \\Aix\\ < \/bi, Mi G \s 



This theorem shows that if c is estimable, then Problem (21) admits a solution of rank-one 



and reduces to Problem ([8j). This is another proof of Theorem (3.3) which explains why c— optimal 
designs can be computed by SOCP. To our sense, this proof gives a better understanding on how the 
computing gap from SDP to SOCP is crossed : the intrinsic positive semidefiniteness of the data in 
the SDP ensures that there is a solution of rank 1, which explain why the complexity of the problem 
fades. 



Remark 5.3. The rank reduction theorem 5.2 admits a generalization, which is presented in Sag09 
and allows one to handle the case with multiple constraints Rw < b. 

Remark 5.4. The theorem above can be extended to a semi-infinite programming context. This allows 
one to write the c— optimal design problem over an infinite regression region X under the form: 

max c^u (26) 

u 

Wx G X, \\A{x) u\\ < 1. 
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Note that this formulation is given with a theoretic purpose only, since the resolution of this kind of 
problem is very hard, even in the case in which A{x) is linear in x. If some regularity conditions 
are satisfied by the application x H- A{x), then there is a stochastic algorithm which converge to the 



optimal of Problem (26) with probability 1 |TMT06j . but the convergence can be extremely slow. If 
we can solve this optimization problem, though, then the points x where the inequality constraint is 
active are nothing but the support of the optimal design. 

6. T-optimality for K^O 

When several quantities are of interest (optimal design for another classic criterion from 

the experimental design literature is the T— optimal design problem: 

sup ^t{0 ■= trace {K^M{w)-K)-^ (27) 

s 

s.t. X^""^* ~ V z G [s], > 0, a^i G X. 

i=l 

The criterion $r can be extended by continuity to the designs C, such that K'^M{w)~K is not invert- 
ible, i.e. such that ^ ^ '^{K). Contrarily to the A— optimality criterion, there are some examples where 
the maximum is attained for a nonfeasible design ^ ^ '^{K) (see Pukelsheim |Puk93] . Section 6.5). 
Following the terminology of Pukelsheim, we call a design formally T— optimal if it maximizes the 
objective criterion $t subject to the constraints Yli=i = 1 only, i.e. when we ignore the feasibility 
condition Range ii' C Range M(iu). Note that a formally T-optimal design always exists, because 
the feasibility region becomes compact when we remove the feasibility constraint. Pukelsheim shows 
in Section 9.15 of |Puk93] that the T— optimal design problem for the full parameter 6 is trivial: if 
is estimable {K = I), then a design is formally T— optimal for if and only if it allocates all the 
weight to the experiments i such that II^iHf is maximal. However, when the quantity of interest is a 
parameter subsystem <^ = K^0, K I, the problem becomes computationally challenging. 

We show in this section that it is possible to compute a formally T— optimal design for K^0 with 
a SOCP when X is finite. This gives another example of optimal experimental design problem which 
can be handled via Second Order Cone Programming. 

Theorem 6.1 (T-optimality SOCP). Let X = [s] be finite, K^0 he an estimable quantity, and 
((t*, [/*), (Z*, it>*, 7*)) be a pair of primal and dual solutions of the second order cone programs: 

min t (28) 
K^U = I 

\/i G [s], \\A,U\\l < t 



max 



(trace Z^ + Y^n) (29) 



i=l 

s 



i=l i=l 

Wi G [s], \\Zi\\p < iwai. 
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(Note that these are Second Order Cone Programs indeed; we have let the hyperbolic constraints to 
simplify the notation, otherwise the matrices AiU and Zi need be vectorized). Then, w* is formally 



T— optimal for K^6, and the value of the supremum in Problem (27) is t* = — (trace(ZQ) + X]j7i*)- 
If moreover Range C Range M(it>*), then w* is T— optimal. 

Proof. Our proof relies on the general definition of the information matrix for K^0, which is given 
by Pukelsheim in Chapter 3 of |Puk93j : 

Ck(0 ■= min ^ U^M(i)U (30) 
s.t. K^U = I. 

In the latter expression, the minimum is taken with respect to the Lowner ordering. Pukelsheim shows 
that the minimum exists indeed (which is not obvious since the Lowner ordering is a partial ordering), 
as a consequence of the Gauss-Markov Theorem (cf. Theorem 1.21 in |Puk93] ). and moreover that 
Ck{0 coincides with the standard expression {K'^ M {w)~~ K)~^ for feasible designs ^ = {xk,Wk} G 

Now, since the trace of a matrix preserves the Lowner ordering, we can express the (formal) 
T— optimal design problem as the following saddle point problem: 

max trace C/^(it>) = max min traceU^ M(w)U 

i">o. E,"'i=i -^yo, E,"'i=i U: KTu=ir 



max min > 

W>0, Y.^Wi = l U: KTU = Ir ^ 



min max \\AiU 



i=l 

|2 



The exchange of the max and the min above is a consequence of Sion's minimax theorem {{w, U) 

is continuous, concave in w and convex in U). We next introduce a variable t which 
must be larger than all the quantities ||y4jf/|||., and we have shown that the (formal) T— optimal design 



problem for K is equivalent to Problem (28). The (formal) T-optimal design w* is the optimal 



dual variable corresponding to the hyperbolic constraints in Problem (28). It follows that w* can 



be computed by solving the dual optimization problem (29). Finally, the value of these optimization 



problems is the same by Strong duality (Slater condition holds), and is equal to the optimum of the 



T— optimal problem (27). 

□ 



7. Convex optimization and S— optimality 

The S'-criterion was introduced by Lauter |Lau74j in order to tackle the uncertainty of the exper- 
imenter on the true model, by considering a class of r plausible models with means 

in which the quantity to estimate is (k = c^^O (Wk G [r]). 

In other words, the measurement y{x) at x is modeled as a linear function of the parameter 0, 
which depends on the model, and must be used to estimate a linear function ( of the parameter in 
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each model. In practice, the parameters of each of these models may be different. This can be handled 
by setting the j^^ column of A(^k),x to whenever the k^^ model at x does not depend on 9j. Note 
that we write the index of the model in parenthesis, in order to avoid ambiguities with the index of 
the experiment. 

Given a nonnegative vector f3 of size r with sum 1, where indicates the importance that 
the experimenter attaches to the model k, or the importance of the linear combination CfJ^O, the 
5*^— criterion is: 

r 

Sf3iO = ^/3fclog(cfc^M(fc)(e)-Cfc), 



k=l 



where 



Mik){0 



A 



is the information matrix in the k^^ model. A design minimizing this criterion is called S'/g— optimal. 
An interesting case occurs when the s models are identical. This is an alternative approach to the 
A— optimality for K^0, with weightings on each linear combination c^^O to be estimated. Dette 
studied the difference between these two approaches in Section 4 of |Det93j . 

We are next going to show that the S/j— optimal design of multiresponse experiments reduces 
to the problem of maximizing a weighted geometric mean under norm constraints. This is of great 
interest for the computation of S'/j— optimal designs. Indeed, this optimization problem is a geometric 
program, and so it can be reformulated in a form for which a self-concordant barrier function is known, 
and it can be solved efficiently to the desired precision by interior point techniques (see e.g. |BV04j ). 

Theorem 7.1. Let {t, {vik),w) be a solution of the following optimization problem. Then, w also 
minimizes the Sf^— criterion. Moreover, the value of this program coincides with the value of its dual, 
which we give below. 



min Sfiiw) 

w>0,Y^^ Wi=l 



2 min log(tfc) 

t,{vik),w ^-^ 
s 

tkCk = ^Aj,^-^ -Vik, Vk e [r]. 



i=l 



< 1. 



Vz G [s], 



i=l 



2 max 2, Pk log 

k=l 

A(l),^h^/^//3l 
A(^r),ihr/\/~i%' 



< 1 



Wi e [s]. 
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The variables of the primal optimization problem are w G M"^ (the design), t E W and the vectors 
Vih G M', for i G [s\ and /c G [r]. The variables of the dual problem are the vectors hi, . . . ,hr E M"^. 

Remark 7.2. The primal problem (P^ ) is equivalent to maximizing the geometric mean ni=i ^k'' under 
the same constraints, by monotonicity of the log function. Therefore, if /3 is rational and the least 
common denominator of the is g < 2^, we can reformulate Problem (P^) as a SOCP by introducing 
less than 2^^ additional norm constraints (cf. Section 6.2.3 of |NN94j or [LVBL98] ). 



The proof of Theorem 7.1 relies on a series of reformulations of the the Sf3— optimal problem, 
relying on Lagrange duality techniques and Theorem 5.2 We will prove this result in appendix. 

Then, we will show that the optimality conditions of the present convex optimization problem 
can be interpreted as geometrical conditions, which yields a generalization of the theorem of Dette 
for S"— optimality to the case of multiresponse experiments |Det93j . This geometrical characterization 
relies on the following Elfving-type set: 



V, 



convex— hull 



v 



^{r),£c 



xeX, efcGM^ J]/9A:||efc||' < 1 

fc=i 



C 



Theorem 7.3 (Geometrical characterization of multiresponse 5^3— optimality). The design w is 
S/3— optimal (and solution of Program {Pfi)) if and only if there exists a vector t G W and vectors 
G M' (i E [s],k E [r]), such that 

(i)\/iE[s], El=iPk\\eikf <l 











\ 


(ii) Diag(t)C = 












^ tj.Cf j 




T A 


/ 



(Hi) Diag(t)C lies on the boundary ofV/^, with a supporting hyperplane whose normal direction is 
given by the matrix H = [hi, . . . , hr]'^, with h^ E . In other words, 



D eV^=^ {H,D) <1 



(iv) H satisfies the equalities 

hhjck = (3k, yk E [r]. 

In this case, the optimal variables of Problems (Df^) and (Pfs) are t, := WiSn. (Vi G [s], V/c G 
[r]), and (/ifc)fce[T-]) so that the optimal Sf^ — criterion is — 2 ^^^j^ log(tfc). 

Theorem |7.3| is established in appendix. 

Remark 7.4. As in the case of single response experiments |Det93] . the geometrical characterization 
remains true when the regression range X is infinite. It can also be shown with semi-infinite program- 
ming techniques that the following convex semi-infinite program is valid for the general 5^— optimal 
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design Problem: 



min Sf3{^) = 2 max (3k log 



k=l 



A{r),xhr/ yWr 



< 1. 



Remark 7.5. Dette showed in |Det93j that D— optimality for the full parameter is a particular case 
of S"— optimality. As a consequence, we can formulate the D— optimal design problem as a convex 
optimization problem in the form of (-P^). To see this, Dette considered the virtual nested models, 
where the parameter of interest in the k^^ model is 9k^ and the observations only depend on the first 
k parameters: is the matrix Ai restricted to its first k columns, so that M(fc)(iu) is the upper 

left k X k submatrix of M{w), and = [0, 0, 1] is a vector of length k. Using the relation 

^ /X / / X i\ det M(u_-i)(w) 



detM(fc)(i(;) 



it can be seen that 



Si 



li/m,...,i/m]i'w) = -—logdet M{w) 



which is exactly the D— optimality criterion. 

Theorem |7.1 can now be used to formulate the D— optimal design problem as: 



max — logdetM(iu)= 2 max logffTTt,) 

tkCk 



l/m 



Z2 ^Jk),i''^ik, 



Vil 



< w 



VA; G [m] 
Vz G M, 



(31) 



i=l 



As pointed out in Remark 7.2, this optimization problem can be reformulated as a SOCP by intro- 
ducing n additional norm constraints, where n is the smallest power of two that is greater or equal to 
m. 



8. Numerical Experiments 

In this section, we evaluate the benefits of the SOCP approach for the computation of optimal 
experimental designs. We will see that the second order cone programs presented in this paper are 
very efficient when the number r of quantities to estimate is small (in particular for c— optimality). 
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We will compare the SOCP approach approach to the semidefinite programming/MAXDET ap- 
proach |VBW98] ■ and to other classic algorithms from the experimental design literature. We concen- 
trate on Wynn-Fedorov-type exchange algorithms, and Titterington-type multiplicative algorithms. 
The former were discovered independently by Wynn |Wyn70| and Fedorov |Fed72j , and consist in mov- 
ing at each step toward the experiment with the largest directional derivative. Several versions and 
refinements of this procedure were proposed; we will use the IncDec procedure of Richtarik |Ric08j . 
which specifies step lengths for which the precision 6 is achieved in 0(1/5) iterations. 

On the other hand, multiplicative weight algorithms were introduced in 1976 by Tittering- 
ton |Tit76j . The monotonic behavior of any sequence generated by this algorithm was recently proved 
by Yu |YulOj . for a large class of design criteria, including A— and optimality. We will also con- 
sider a variant of the latter algorithm, for which Dette, Pepelyshev and Zhigljavsky |DPZ08j have 
established a convergence result in the case of D— optimality, and conjectured the convergence for 
other criteria. These multiplicative algorithms use respectively a power parameter A g]0, 1] and an 
acceleration parameter 7 G [0, 1]. We found that the values A = 0.9 and 7 = 0.9 gave the best results 
for A— optimality in our experiments, and so those values will be used throughout this section. For 
Z^— optimality, we have used the acceleration parameter 7 = 0.5. 
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SOCP (13) 
[this paper] 


SDP 


IncDec Exchange 


Accelerated Mult. 


Mult. Weight with 


IVBW98] 


jRic08| 


Weight (7 = 0.9) |DPZ08j 


Exponent A = 0.9 |YulO| 


2 


0.082 


2.897 


10.039 


3.026 


2.979 


22 


0.120 


3.017 


99.510 


9.598 


9.240 


23 


0.166 


4.798 


13.112 


5.883 


6.040 


24 


0.175 


6.828 


24.431 


12.574 


12.204 


25 


0.352 


15.820 


29.454 


11.258 


11.123 


26 


0.816 


66.281 


54.379 


13.407 


13.419 


27 


2.636 


338.669 


92.537 


37.935 


36.679 


28 


10.496 


failed 


202.509 


96.594 


99.751 


29 


44.689 


failed 


412.890 


585.619 


597.442 


210 


154.187 


failed 


498.616 


551.634 


539.130 



Table 1: CPU time (s) of the different algorithms, for typical random instances of the A— optimal design problem with 
s = 2^°, Z = 1, r = 3, and different values of m. 

We will first consider random instances of optimal design problems, in order to evaluate to which 
extent each parameter affects the computation time. Then, we will consider a simple polynomial 
regression model, for which we shall see that the SOCP approach is well-suited when the number of 
support points is large. Finally we will present some results from a network application - which is at 
the origin of this work- where the sampling rates of a monitoring tool should be optimized subject to 
multiple constraints. 

8.1. Random instances 

In this section, we consider random instances of optimal experimental design problems, in which 
the entries of the / x m matrices {Ai)i^[s\ follow a normal distribution, as well as the entries of 



the m X r matrix K. For every considered instance, we use SeDuMi to solve the SOCP (13) and 
the y4— optimality SDP |VB W98] : we have implemented the other procedures in Matlab. In all our 
experiments, the stopping criterion is based on the general equivalence theorem of Kiefer |Kie74j : the 
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Figure 3: Comparison of two algorithms on random instances (A— optimality) with m — 120, I = 30, r — 1, and varying 
s. The box plots represent the distribution of the computing times for 10 random instances. 

computation stops as soon as the ratio between the maximum of the gradient and the value of the 
criterion is below 1.001 (as in |DPZ08j ). 

We start by evaluating the effect of r, which turns out to be the determining factor for the 
performance of the SOCP approach. To this end, we set m = 75, s = 150, / = 1 (single-response 
experiments), and we let r vary between 1 and 75. The computing time of the different algorithms is 
plotted against r in Figure |4j We notice that the SOCP is the fastest for small values (r < 7), but 
performs badly when r is large, while the multiplicative weight algorithms are insensitive to the value 
of r. For this reason, we will chose small values of r in further experiments, since the SOCP approach 
might not be well adapted for large r. 

We next study the effect of s (the number of available experiments) for the case of c— optimality 
(r = 1). For these experiments, we set m = 120, / = 30, and we take s in the set {2*^, k = 2, . . . , 11}. 
The performance (in terms of CPU time) of the SOCP is compared to that of the accelerated mul- 
tiplicative algorithm (for 7 = 0.9, |DPZ08j ) on the log-log plot of Figure |3} The boxes represent 
the distribution of the CPU time, on 10 randomly generated instances. We see here that the SOCP 
approach is in average ten times faster as soon as s > 32. 

To evaluate the effect of m, we set s = 2^", Z = 1, r = 3, and choose m in the set {2\k = 1, . . . , 10} 
(Note that since / = 1 and K is randomly generated, we must have s > m for the instance to be 
feasible). The results of each algorithm are displayed in Table [TJ It is striking that the SOCP 
approach is the best one, while the SDP is the worst when m becomes large, which demonstrates the 
importance of the rank reduction discussed in Section [sj For m < 2^, the SOCP is 10 times faster 
than all other algorithms. In the last row of the table however, this ratio is lower. This might be 
because s = m = 2^° in this case, such that all experiments are support points of the optimal design, 
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Figure 4: Comparison of four algorithms on typical ran- 
dom instances (A— optimality) with m =^ 75, s ~ 150, 
I = 1 and varying r. 



Figure 5: Comparison of four algorithms on typical 
random instances of the minimum covering ellipsoid 
(D— optimality for 0, to = 3) and varying s. 



and classic algorithms certainly take advantage of this situation (while it does not make a difference 
for interior point codes). 

Pronzato |Pro03] has shown that we can improve the multiplicative algorithms thanks to a simple 
test which allows to remove on the fly experiments which do not belong to the optimal design (i.e. 
with a zero weight), and which was refined by Harman and Pronzato |HP07] . This can considerably 
improve the performance of the multiplicative algorithms when there are a lot of points with a zero 
weight. As in |HP07j . we have studied random instances of the minimum covering ellipse, but in M?: 
m = 3, K = I3, and we draw s independent random regression vectors (/ = 1) from a normal distri- 
bution ai ~ A/'(0, 1 3), with s increasing from 50 to 500. The D— optimal design problem is equivalent 
to finding the minimum volume ellipsoid which contains the s vectors a^, and the D— optimal design 
is supported by points lying on the boundary of this minimal ellipsoid. In accordance with intuition, 
the number of support points of the Z)— optimal design is small, and therefore the test of Pronzato 
and Harman improves dramatically the computing time (cf. Figure [sj. Note however that the SOCP 



for D— optimality (31) remains competitive with the latter approach. 



8.2. Polynomial Regression 

We have computed the A— and D-optimal designs (for the full parameter 6), for a polynomial 
regression model of degree 5: 



A{x) = [1 



on the regression region X = [0, 3]. The optimal designs are represented on Figure [6] In this problem, 
we have r = m = 6, which is small. Therefore, we can hope that the SOCP approach will perform well. 
The computation times are plotted on Figures [7] and [8} as a function of the number of points considered 
for the discretization of the regression interval X = [0, 3]. For the A— optimal design, the experimental 
setting was the same that the one of previous section. For the D— optimal design problem, we 
solved the geometric program ^ with SeDuMi. We have also implemented the classic multiplicative 
algorithm, the accelerated algorithm with 7 = 0.5, and the MAXDET program |VB W98| . Contrarily 
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to the multiplicative algorithms, the interior point algorithms (SOCP and MAXDET) seem to be 
insensitive to the size of the discretization grid. For these instances, the SOCP is roughly two times 
faster than the MAXDET program. Also note that the effect of the acceleration parameter 7 is clearly 
visible (red curve vs. green curve). We point out that for these polynomial regression problems, the 
tests of Pronzato and Harman |Pro03t IHP07j to remove points that do not belong to the support of 
the optimal design did not yield any improvement. 



0.85 

0.3 
0.25 - 

0.2 
0.15 

0.1 
0.05 

°0 



-A-optimal 
-D-Optimal 



0,5 



1,5 
X 



2.5 



Figure 6: A- and D-optimal designs for the polynomial regression model of degree 5 on X ~ [0,3]. 




Figure 7: A-optimal design for the polynomial regression 
model: evolution of the computation time with the num- 
ber of points for the discretization of [0, 3]. 



Figure 8: D-optimal design for the polynomial re- 
gression model: evolution of the computation time 
with the number of points for the discretization of 
[0,3]. 



8.3. Optimal Sampling in IP networks 

We finally show some results for an application to the optimal monitoring of large IP networks. 
Assume that an Internet provider wants to estimate the traffic matrix of his network, that is, the 
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volume of traffic between each pair of origin and destination during a given time period. To this 
end, he disposes of a monitoring tool, which can be activated at different sampling rates in different 
location of the network, and is able to find the destination of the sampled packets. For networking 
issues, the intensive use of this tool is not suitable, because it creates an overload both in terms of 
CPU utilization of the router and bandwidth consumption. The sampling rates should therefore be 
tuned cautiously on each interface, in such a way that the number of sampled packets remains under 
a target threshold. 

This situation can be represented by an optimal design model with multiresponse experiments: 
the set of available experiments X coincides with the interfaces of the network where the monitoring- 
tool can be activated: when the software is installed on a given interface, we obtain an estimation of 
the sum of the flows that traverse this interface, and that have destination D, for every destination 
D reachable from this interface. In |SGB10] . two coauthors and I have shown that if the sampling 
rates are small, then the Fisher information matrix of the sampling design has the standard form (|3]) 
(after an appropriate normalization of the observation matrices relying on a prior estimate of the 
unknown OD traffic matrix). The optimal monitoring problem can thus be formulated as an optimal 
experimental design problem with multiple resource constraints. 

We ffist study some c— optimal sampling problems with the simple constraint ^^=1^4 = 1, so 
that we can compare the present SOCP approach to classic algorithms. Table [2] summarizes the 
results (in terms of CPU time) for several problems: each instance is defined by a network and the 
type of interfaces considered. We used the topology of three networks: Abilene, which consists in 11 
nodes, m = 121 OD pairs and 50 links; the Opentransit backbone of France Telecom, with 116 nodes, 
m = 13456 OD pairs and 436 links; and a clustered version of the latter network, thus reduced to 
31 nodes, m = 961 OD pairs and 133 links. The natural problem is to activate the monitoring tool 
independently on each link (interface= "links" ) . However, we also considered the academic problem of 
imposing the same sampling rates on all incoming links of each router, which is equivalent to consider 
each router as a big interface (interface= "Nodes"). For all these instances the vector c was drawn 
from a normal distribution. The threshold for the stopping criterion was lowered to 1.01 for this 
network application, since this value suffices to obtain good designs in practice. 
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SOCP 




0.021 


0.036 


0.078 


0.094 


5.52 


33.03 


SDP 




1.095 


1.178 


692.37 


734.25 


failed 


failed 


IncDec Exchange 


0.518 


0.823 


4.57 


19.69 


failed 


failed 


Mult, algo (7 = 


0.9) 


0.009 


0.043 


0.018 


1.893 


failed 


failed 


Mult, algo (A = 


0.9) 


0.008 


0.038 


0.018 


1.468 


failed 


failed 



Table 2: CPU time (s) for different instances of c— optimal design arising from an optimal monitoring problem in IP 
networks (with the standard constraint Wi — 1) 

We can see in the table that the multiplicative algorithms perform better than the SOCP approach 
on the instances where s is small (1st and 3rd columns in Table [2]). On the other instances however, 
the SOCP performs well, and it is the only method which returned a solution for the Opentransit net- 
work. The SDP and the multiplicative methods failed because of memory issues (in the multiplicative 
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algorithm, a full rank update of the 13456 x 13456 information matrix should be carried out at each 
time step). The IncDec Exchange algorithm did not crash, but it had not converged after 2 hours of 
computation. 

We next turn to the case of general constraints of the form Rw < b. Since we do not know 
any other algorithm which can handle optimal design problems with multiple resource constraints, 
we compare the SOCP and the semi-definite programming approaches only. Table [3] summarize the 
results (in terms of CPU time) for several problems, specified as previously by the network and the 
type of interfaces considered, and also by the type of the constraint matrix R. In the optimal sampling 
problem, the matrix R usually depends on the volume of traffic observed at each router (cf. |SGB10j ). 
We simulated this data from a uniform distribution, a lognormal distribution, or we used real traffic 
loads. To see the effect of the number of constraints, we also generated arbitrary constraints matrices 
of different sizes. 

In comparison to the SDP, the computation time can be reduced by a factor in the order of 1000 
on the instances from the clustered network. Moreover, the SOCP approach is able to handle huge 
instances arising from the Opentransit network (in which m = 13456). 
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Links (s = 436) 


Constraints 

SOCP 
SDP 


R: 4 X 31 
(arbitrary) 
0.141 
350.63 


R: 31 X 133 
(uniform traffic) 
0.462 
451.69 


R: 130 X 133 
(arbitrary) 
1.135 
430.71 


R: 12 X 436 
(arbitrary) 

23.32 

failed 
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(real traffic) 

187.59 

failed 



Table 3: Computation time (s) for different instances of c— optimal design arising from an optimal monitoring problem 
in IP networks (with multiple constraints Rw < b). 
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Appendix 



A. Proofs of Theorems 17.11 and 17.31 

We start with the foUowing lemma, where we show that the S'/3— optimal design problem can be 
formulated as a convex optimization problem with SDP constraints: 

Lemma A.l. The optimal variable w* of the following convex optimization problem also minimizes 
the Sf^ — criterion. The value of this program coincides with the value of its dual, which we give below: 

r 

min Sq{w) = min — > /3i.logri. (Pa — SDP) 

- k=i 



s 



i=l 



max Pk log ^-l^ (D^ - SDP) 



k=l ' 



^trace(v4(fc),i Zj, v4j,)^i) < 1, Vi e [s] 



k=l 



Proof. As in the proof of Theorem 5A, we reexpress the variance of the k^^ quantity of interest 
Cfc-^M(fc)(iy)~Cfc with the help of a generalized Schur complement (for an arbitrary design w): 

[ck^M(k){w)~Ch) ^ = max = max 

y 0. M(fc)(it») y TkCkcJ. 



max 




M(k){w) 






l/Tk 



As pointed out in the proof of Theorem 5.1 , we can always assume that > if we exclude the trivial 
case Cfc = 0, so that the latter expression is well defined. Now, by monotonicity of the log function, 
we can write: 

r 

min Sfi{w) = - max log (cfc^M(fc)(iy)"Cfc) ^ 

T 

= - max y~] (3k log Tk 

k=l 

Mi^k){w) y TkCkcJ, yk G [r], 

which is exactly Problem {Pp — SDP). It is clear that Problem {Df^ — SDP) is convex and strictly 
feasible, so that the Slater condition is fulfilled, and strong duality holds. It remains to show that 
Problem (D^ — SDP) is indeed the dual of — SDP). To this end, let us form the Lagrangian of 
Problem (P^ - SDP): 

r r s 

£((t, w), {Z, A)) = - ^/3fclogrfe + ^(^fc, r^CfcCfc^ - M^k)iw)) + X{Y,Wi - 1). 

k=l k=l i=l 
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The Lagrange dual function is given by 

g{Z,\):= min £((r, (Z, A)) 

x>0, w>0 



-A + ^min(rfcCfc^ZfcCfc - /5fclogrfc) + y^min Wi{\ - S^iAT^. ^ Ai^k),uZk)). 

k i k 

VA; ( 

-oo otherwise. 



Note that in the above expression, the minimum over is attained for = J^^^ ; and this equation 
must be satisfied by the optimal variables and Z^.. Since we observed that strong duality holds, the 
value of the dual optimization problem must be equal to the value of the primal, and so the optimal 
variables (denoted with stars in superscript) satisfy: 

- E /^'^ < = + E /^'^(l - —F^^ ^ A* = E = 1- 

fc=l k ^k^^ k=l 

We can now make the dual problem explicit: 

Ck^ZkCk 



max q(Z,X)= max q(ZA) = max dulos;— 

k=l ^'^ 



trace (A(fc)^i Z^ ^ffc),i) < 1' ^ M 



fe=i 

This completes the proof of the lemma. □ 

Now, we show that there is a solution of Problem (D^ — SDP) for which every Z^ has rank one, 
thanks to the theoretical result presented in Section [5j 



Proof of Theorem 7J_. We first write the program {D 13 — SDP) in the form of a separable optimization 



problem, by introducing some vectors cxi {i G [s]) of size r, satisfying X]fc=i — 1- 

min Si3{w)= max //fc(aifc, • • • , a^fc) 



,fc=l 



k=l 

where we have set 



\/k e [r], fk{y) = max /3fclog 



trace (v4(fc),j Z^ AJ.)^^ < y^, Vi G [s]. 
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By use of Theorem (5.2) (and monotonicity of the log function), the minimization problem over 
Zk in /fc(-) has a rank-one solution [Zk = h^hh^), and we obtain: 

fk{aik, . . . , ask) = max /3k log 

\\A{k),i hk\\ < v^Oifc, G [s]. 

Now, we use the associativity of the maximum to reformulate the S^— optimum design problem: 



min Sf3{w) = max y ^ j3k log 

r 

^p(fc),, ^fcf < 1, Vie[s] 



k=\ 



Finally, we make the change of variable = hy.y/^ in order to obtain the desired optimization 
problem, that is {Df^). It remains to show that Problem (P^) is the dual of (Df^). The convex 
problem (P/j) is strictly feasible, so that Slater condition is fulfilled, and strong duality holds. 
We will now dualize Problem (P^). This part of the proof is very similar to the dualization of 
Problem {Df^ — SDP) of the previous lemma. We include it here, though, for the reader's convenience. 



In the sequel, we denote by Vi the concatenation of the vectors Vik. Vi = [vn , . . . ,Vi 



prl 



and by 



f3 the vector containing /3fc entries arranged in blocks of length I: (3 = /3r, ... , PrY ^ 

WK We also use the symbol © to denote the Hadamard product of vectors (elementwise product). 
With this notation, we can write: 



/3 QVi 



We denote by V the family of vectors (i'ifc)iG[s], fce[r] and by H the family of vectors (^fc)ie[s]- Now, 
let us form the Lagrangian 



£((t, V,w;), {H,fj,,X)) = J^-/3fclogtfc + ^hJ{tkCk -^A[f^^^^Vik] 



(A.l) 



k=l 



k=l 



i=l 



1 /2 



1=1 



i=l 



The Lagrange dual function is given by 

g{H, ^, A) := min C{{t, V, w), {H, fi, A)) 

t,V, w 



-A + y^mm{tkhk^Ck - (3k log 4) + mm Wi{\ - fii 

k=l 
s 

E~l/2 rp 
min(/ii||/3 fill - Zi Vi 
7I_- 



1 = 1 



1 = 1 
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where we have defined the vectors zi^ := [hi^ Aj-^-^ ...,hr'^ Aj^.^ ■] G WK In the latter equation, the 

minimum over tk is finite if and only if c^^hk > 0, and is attained for tk = ck^^hk ' expression 
in Wi is bounded from below (by 0) if and only if /ij = A. The reader can also verify that the 

1/2 

minimization with respect to Vi is unbounded whenever ||/3 > /ij, and takes the value 

1/2 -1/2 

otherwise. The Cauchy Schwarz inequality between the vectors (3 Qzi and f3 Qvi shows indeed 

1 1/2 

that the minimum is attained for a vector such that Vi is proportional to /3 02^ if ||/3 0Zi|| = /ij, 

1/2 

and for i;^ = if \\(3 Zi\\ < /ij. To summarize, 



-A + ELi - log if V2 e [s], /ii = A and ||/3 < /i^ 

— oo otherwise. 



Now, since the primal and the dual share the same optimal value (we observed that strong duality 
holds), it follows that the optimal variables (denoted with stars in superscript) satisfy 



g{H*, A*) = -X* + J2 Pk{l - log -4^) = E -^'^ l^g*^^ 



k=l k=l 

Combining this equality with the stationarity equations t^. = ^^y^^, and /x* = A*, we obtain: 



fc=i 



We can now make the dual of (P^) explicit: 



min Sgiw) = 2max> /3felog ^ 



A(i)^ihi \ 



A(^r),ihr j 



11/3 '^'0 2i|| < 1, G [s]. 



This program is the same as (-D/?), and it completes the proof of Theorem 7.1 



□ 



Now, we can write that a design is optimal if and only if Karush Kuhn Tucker (KKT) optimality 



conditions hold for problem (P^). In fact, we show in Theorem 7.3 that these KKT conditions 
are equivalent to a geometric characterization of S^j— optimality, which generalizes the theorem of 
Dette |Det93j to the case of multiresponse experiments. 



Proof of Theorem 7^, Since strong duality holds between Problems (P^) and (Df^), the Karush Kuhn 



Tucker (KKT) conditions characterize the optimal variables. We sum up the KKT conditions here. 
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which stem from the duahzation step of the proof of Theorem 7.1 

s 

(Feasibihty) tfcCfc = ^ Af^-^^^ Vik 

i=l 

s 

^^i;, = 1 



2=1 



, ni 1 \ /II ri^l"^ II \ /-> (since /ii = l) ,, 

(Lomp. blackness) A*t(IIP ©I'ill — = =^ Wi = ||p Q Vi 



(Stationarity) Pk = tkhk^Ck 



1/2 

11/3 < 1 and Vi = 



if Wi = 



1/2 1 

11/3 = 1 and Vi = Wi (3 Q Zi otherwise. 



(A.2) 

(A.3) 

(A.4) 
(A.5) 

(A.6) 



Now, let (t, V, w) and H = [hi, hr]"^ be a pair of primal and dual solutions of Problem [Pfi)-{D f^): 
they satisfy KKT equations(A.2)-(A.6). We set = —Vi whenever 7^ and = G otherwise, 
so that (A.4) implies 

■r 

Vi e [s], ^PkWf^ikW^ = Wi<l 

k=l 



and (A.2) implies 



i=l 



i=l 



These relations are nothing but conditions (i) and (ii) of Theorem (7.3). Clearly, the stationarity 
equation (A.5) is the same as condition (iv) of Theorem (7.3). It remains to show that {in) holds. Let 
D be an arbitrary matrix from Vp: when the regression region is A* = [s], there exists a vector ck in 

"1/2 

the unit simplex of W as well as vectors {6i := [<5ii"^, . . . , 6ir'^]^ G ]R''')jg[s] satisfying ||/3 Q Si\\ < 1 
such that 

s I <5ii^^(i),i ^ 

We now prove that H = [hi, hr]'^ is the direction of the supporting hyperplane of P^: 

{D,H) = aiSik^A(^k),ihk 

i,k 
s 



EC T 

1=1 

i=l 

s 

<$^a. <1, 
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where the inequahty is Cauchy-Schwarz, and we have used the stationarity condition (A.6). Finally, 
(iii) holds since Diag(t)C lies on the boundary of V/s: 

r 

{Dmg{t)C,H) = Y^^ucjh^ = = 1. 

k=l k 

Conversely, assume that conditions {i) — {iv) hold. We set Vi = WiSi, and we show that (t, V, w) and 
H satisfy the KKT equations( A.2[ )-( [A.6 ). As in the direct part of this proof, it is straightforward to 
show that the stationarity equation (A. 5) holds, as well as the feasibility condition (A. 2). 
Let us now define the vector Zi as in (-D^): 

Condition {Hi) states that for all vector a in the unit simplex of W, and for all vectors {5i G ]R''')jg[s 

~l/2 

satisfying \\(3 <5i|| < 1, we have 



ik 



/3 ' &z. 



In particular, if ck = is the i unit vector of the canonical basis of M**, and 5i = ?-i/f^ 
obtain: 



we 



ik 



^ (r z,n$-'^' z,) = iir © z^w < i. 



and we have shown the inequality of (A.6). 

The fact that Vi = when = is clear from the way we have defined Vi, and the complementary 
slackness equation (A.4) also holds in this case. 



It remains to show that w is feasible (A. 3) and that (A.6) holds for Wi > 0. Note that (A.6) in turn 
implies the complementary slackness equation (A.4). 

To this end, we write: 



k=i 



k=l 



ik 

s 

= ^ WiSi^Zi 

i=l 

-1/2 x'r/;;-l/2 

i=l 

,, ;;l/2 ,, ,, ;;-l/2 

Wi\\(3 QsiW 11/3 2, 



i=l 
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The latter inequality is Cauchy-Schwarz, and it provides an upper bound which is the (weighted) 
mean of terms all smaller than 1. We can therefore write 

s 

5^^,11/3'^' 6,11 ||/3"'^'0 2i|| = 1, (A.7) 

i=l 

and the Cauchy-Schwarz inequality must be an equality whenever Wi ^ 0, which occurs if and only if 

~l/2 1/2 

(3 © is proportional to (3 z,. Finally, we must have = 1, so that w is feasible (A. 3), 

and each positively weighted term in the sum (A.7) must be 1: 

[ 11/3 QziW = 1 

-1/2 

These two norm constraints further force the coefficient of proportionality between (3 Q ei and 
^ ^ to be 1, so that = /3 Zi, and Vi = WiP Zi, which completes the proof. □ 
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